Toward integrating word sense and entity disambiguation into statistical machine translation

Authors

  • Marine Carpuat
  • Yihai Shen
  • Xiaofeng Yu
  • Dekai Wu

Abstract

We describe a machine translation approach being designed at HKUST to integrate semantic processing into statistical machine translation, beginning with entity and word sense disambiguation. We show how integrating the semantic modules consistently improves translation quality across several data sets. We report results on five different IWSLT 2006 speech translation tasks, representing HKUST’s first participation in the IWSLT spoken language translation evaluation campaign. We translated both read and spontaneous speech transcriptions from Chinese to English, achieving reasonable performance even though our system is essentially text-based and therefore neither designed nor tuned to tackle the challenges of speech translation. We also find that the system achieves reasonable results on a wide range of languages, by evaluating on read speech transcriptions from Arabic, Italian, and Japanese into English.

Similar resources

Word Sense Disambiguation Improves Statistical Machine Translation

Recent research presents conflicting evidence on whether word sense disambiguation (WSD) systems can help to improve the performance of statistical machine translation (MT) systems. In this paper, we successfully integrate a state-of-the-art WSD system into a state-of-the-art hierarchical phrase-based MT system, Hiero. We show for the first time that integrating a WSD system improves the perfor...

Unsupervised Translation Disambiguation for Cross-Domain Statistical Machine Translation

Most attempts at integrating word sense disambiguation with statistical machine translation have focused on supervised disambiguation approaches. These approaches are of limited use when the distribution of the test data differs strongly from that of the training data; however, word sense errors tend to be especially common under these conditions. In this paper we present different approaches t...

Improving Statistical Machine Translation Using Word Sense Disambiguation

We show for the first time that incorporating the predictions of a word sense disambiguation system within a typical phrase-based statistical machine translation (SMT) model consistently improves translation quality across all three different IWSLT Chinese-English test sets, as well as producing statistically significant improvements on the larger NIST Chinese-English MT task, and moreover never...

Bootstrapping Phrase-based Statistical Machine Translation via WSD Integration

Besides the word order problem, word choice is another major obstacle for machine translation. Though phrase-based statistical machine translation (SMT) has the advantage of making word choices based on local context, exploiting larger context is an interesting research topic. Recently, there have been a number of studies on integrating word sense disambiguation (WSD) into phrase-based SMT. The WSD score...

Word Sense Disambiguation for Statistical Machine Translation

While much effort has been put in designing and evaluating Word Sense Disambiguation (WSD) models for translation in the WSD community, standard Statistical Machine Translation (SMT) systems have achieved remarkable improvements in translation quality without modeling WSD explicitly. However, inspecting SMT output suggests that SMT needs better semantic modeling to accurately translate meaning....

Publication date: 2006